Cross-cloud deployments

ABSTRACT

Systems and methods are provided for managing a distributed database across multiple cloud provider systems. Database elements (e.g., primary, secondary, and/or read-only nodes) are distributed across multiple cloud provider systems. A provisioning component is configured to enable cross-cloud configuration options to specify the manner in which the clusters/replica set members are to be deployed across multiple cloud providers and/or geographical regions.

RELATED APPLICATIONS

This application claims priority under 35 U.S.C. § 120 to and is a continuation of U.S. patent application Ser. No. 17/342,236, entitled “CROSS-CLOUD DEPLOYMENTS”, filed Jun. 8, 2021, which claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional Patent Application Ser. No. 63/036,205 entitled “CROSS-CLOUD DEPLOYMENTS,” filed Jun. 8, 2020, each of which is herein incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to providing cross-cloud architectures for a distributed database system.

BACKGROUND

A number of conventional database systems exist that implement large and scalable database architectures. A variety of database architectures can be selected and tailored to specific data requirements (e.g., large volume reads, high data availability, no data loss, etc.). As the number of systems that support the various architectures increases, the complexity of the database system likewise increases. In some settings, management of the database system becomes as complex as the architecture itself, and can overwhelm administrators who need to make changes on large distributed databases. Further, the design phase of such implementations is rife with error, inconsistency, and conflict. As distributed databases integrate cloud services and virtual architectures, these problems are magnified.

SUMMARY

Various embodiments relate to creating a distributed database that crosses between cloud providers. A custom architecture is provided that maintains connections (e.g., secure) between database elements (e.g., MONGODB™ nodes (e.g., primary, secondary, arbiters, etc.)) distributed across multiple cloud providers. In some examples, architecting a distributed database across cloud providers makes the resulting system more fault tolerant, as catastrophic failures in multiple locations and over multiple cloud providers would have to occur to render the distributed database unavailable. Such a cross-cloud architecture allows for better resource utilization and allocation.

BRIEF DESCRIPTION OF THE DRAWINGS

Various aspects of at least one embodiment are discussed herein with reference to the accompanying figures, which are not intended to be drawn to scale. The figures are included to provide illustration and a further understanding of the various aspects and embodiments, and are incorporated in and constitute a part of this specification, but are not intended as a definition of the limits of the invention. Where technical features in the figures, detailed description or any claim are followed by reference signs, the reference signs have been included for the sole purpose of increasing the intelligibility of the figures, detailed description, and/or claims. Accordingly, neither the reference signs nor their absence are intended to have any limiting effect on the scope of any claim elements. In the figures, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every figure.

FIGS. 1-4 are example screen captures of user interfaces for cross-region deployments, according to some embodiments of the technology described herein;

FIG. 5 is a block diagram of an example distributed database system, according to some embodiments of the technology described herein;

FIGS. 6A-6B are example screen captures of user interfaces for single-cloud deployments, according to some embodiments of the technology described herein;

FIG. 7 is an example screen capture of a user interface for a cross-cloud deployment, according to some embodiments of the technology described herein;

FIGS. 8A-8D are additional example screen captures of user interfaces for cross-cloud deployments, according to some embodiments of the technology described herein;

FIGS. 9A-9C are additional example screen captures of user interfaces for cross-cloud deployments, according to some embodiments of the technology described herein;

FIGS. 10A-10C are example screen captures of user interfaces enabling configuration changes for a cross-cloud deployment, according to some embodiments of the technology described herein;

FIG. 11 is a block diagram of an example database system on which various aspects of the technology described herein can be practiced;

FIG. 12 is a block diagram of an example database system on which various aspects of the technology described herein can be practiced; and

FIG. 13 is a block diagram of an example database system on which various aspects of the technology described herein can be practiced.

DETAILED DESCRIPTION

According to one aspect, various management functions of a distributed database can be facilitated and/or automated to eliminate errors in configuration, reduce downtime, and reduce the requirements for architecting solutions involving updates, healing operations, data transitions, etc. According to various embodiments, distributed databases can include automation agents, backup agents, and/or monitoring agents configured to cooperate with a central management server. In some implementations, the automation agents, backup agents, and/or monitoring agents execute on the various components of the distributed database and provide status information, utilization information, alerting, etc. to the central management server. In such implementations, the central management server operates as the intelligence for identifying and/or triggering automation functions.

It is realized that enhancing the automation agents at the database level can improve over conventional approaches. For example, an intelligent agent can identify and rectify an issue on a database system, sometimes before a conventional implementation can even report the issue. Further, including proxies or caching of common problems and, for example, execution plans, and corresponding binaries or other applications, enables the distributed system to retrieve common solutions across multiple distributed automation agents with less latency and potentially with less bandwidth consumption. In various examples, increasing the processing capability of the distributed automation agents reduces database downtime, and reduces the time that error conditions exist on the database. Patent application Ser. No. 14/969,537 entitled “SYSTEMS AND METHODS FOR AUTOMATING MANAGEMENT OF DISTRIBUTED DATABASES,” filed on Dec. 15, 2015, describes examples of automation agents and example implementations and is incorporated by reference herein in its entirety.

Improving the locality of automation functions can also improve operational characteristics of a distributed database that spans multiple cloud providers. Typically, supporting applications, databases, web-services, etc. with cloud resources hosted by different cloud providers is rife with errors, lost connections, and increased network latency. In such a setting, local automation agents resident on respective cloud providers can handle the majority of the automation tasks without having to cross the cloud provider boundary.

According to another aspect, cross-provider architectures are enabled within a distributed database. Conventional databases provide options for implementing database architecture in the cloud. It is realized that better resource utilization and allocation can be achieved if different cloud provider systems are utilized. Currently, significant hurdles exist in creating a distributed database that crosses between cloud providers. Various embodiments provide a cross-cloud architecture that maintains connections (e.g., secure) between various database elements (e.g., MONGODB™ nodes (e.g., primary, secondary, arbiters, etc.)) distributed across multiple cloud providers. In some examples, architecting a distributed database across multiple cloud providers makes the resulting system more fault tolerant, as catastrophic failures in multiple locations and over multiple cloud providers would have to occur to render the distributed database unavailable.

In some examples, the components of the distributed database are configured to maintain cross-cloud connections using heartbeat signals between components. Configuration metadata can be used to identify cross-cloud channels, and respective components can maintain communication to ensure a cross-cloud connection remains “live.” In other embodiments, indirection layers are used to bridge connections between multiple cloud providers (e.g., a mapping layer may specify mappings (e.g., network mappings) that allow nodes of one cloud provider to communicate with nodes of another cloud provider). The indirection layer(s) may include a control plane for enabling communications between multiple cloud providers and implement the mappings that inform the nodes on how to communicate (e.g., DNS, IP addresses to use, etc.) across the cloud providers. For example, the control plane may implement mappings of MONGODB™ nodes and each node's IP addresses. The indirection layer(s) can also be configured to execute any database operations (e.g., replication, routing, etc.) transparently to the end users. In some examples, the user need not take any special action to implement or use cross-cloud database deployments; rather, the indirection layer(s) provide a universal interface that allows operation, communication, etc., abstracting away the fact that the database is provisioned by multiple cloud providers. In some implementations, the indirection layer(s) may include an application driver (e.g., a MONGODB™ application driver) that transparently connects to nodes in different cloud providers to provide an abstracted experience. Various embodiments include provisioning components configured to access multiple cloud providers and instantiate resources to support cross-cloud database deployments. Patent application Ser. No. 16/010,034 entitled “SYSTEMS AND METHODS FOR MANAGING A DATABASE BACK END AS A SERVICE,” filed on Jun. 15, 2018, which describes examples of provisioning functions and system components, and patent application Ser. No. 15/627,613 entitled “SYSTEMS AND METHODS FOR MANAGING DISTRIBUTED DATABASE DEPLOYMENTS,” filed on Jun. 20, 2017, are both incorporated by reference herein in their entirety. Also, patent application Ser. No. 15/721,176 entitled “LARGE DISTRIBUTED DATABASE CLUSTERING SYSTEMS AND METHODS,” filed on Sep. 29, 2017, is incorporated by reference herein in its entirety.
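
For purposes of illustration, the control plane's node-to-address mapping can be sketched as follows. This is a minimal, hypothetical example (the hostnames, addresses, and `resolve` helper are invented for illustration and are not an actual control-plane implementation):

```python
# Hypothetical control-plane mapping: each replica set member is keyed by
# its logical hostname and mapped to the provider-specific address that
# peers and drivers in other clouds should use to reach it.
NODE_MAP = {
    "node0.cluster.example.net": {"provider": "aws",   "region": "us-east-1",   "ip": "54.1.2.3"},
    "node1.cluster.example.net": {"provider": "gcp",   "region": "us-central1", "ip": "35.4.5.6"},
    "node2.cluster.example.net": {"provider": "azure", "region": "eastus",      "ip": "40.7.8.9"},
}

def resolve(hostname: str) -> str:
    """Return the address a driver or peer node should dial for `hostname`."""
    return NODE_MAP[hostname]["ip"]
```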

According to another aspect, a distributed database may be supported by a replication architecture. For example, the well-known MONGODB™ database employs a replica set architecture to replicate database data. According to some embodiments, a replica set (also referred to herein as a database cluster) includes at least a primary node hosting a primary copy of database data, and at least two secondary nodes hosting secondary copies of the database data. Typically, writes are executed at the primary node, the operations are logged, and the secondary nodes apply the logged operations to their copy of the data.
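
For illustration, the write path described above can be modeled in a few lines. The following is a toy sketch of log-based replication (the class and method names are invented for this example and are not MONGODB™ internals):

```python
class Node:
    def __init__(self):
        self.data = {}   # this node's copy of the database
        self.oplog = []  # ordered log of applied operations

    def apply(self, op):
        key, value = op
        self.data[key] = value
        self.oplog.append(op)

class ReplicaSet:
    def __init__(self, n_secondaries=2):
        self.primary = Node()
        self.secondaries = [Node() for _ in range(n_secondaries)]

    def write(self, key, value):
        # Writes execute at the primary, where the operation is logged...
        self.primary.apply((key, value))
        # ...and each secondary applies the logged operation to its copy.
        for secondary in self.secondaries:
            secondary.apply((key, value))

rs = ReplicaSet()
rs.write("user:1", {"name": "Ada"})
assert all(s.data == rs.primary.data for s in rs.secondaries)
```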

According to one aspect, as described in detail in patent application Ser. No. 15/627,613 entitled “SYSTEMS AND METHODS FOR MANAGING DISTRIBUTED DATABASE DEPLOYMENTS,” filed on Jun. 20, 2017, a cloud-based system or database-as-a-service system is configured to manage the design and creation of a distributed database in the cloud, for example, by providing a provisioning interface via which an end user may create an instantiation of the distributed database on one or more cloud providers. Example implementations of a distributed database including a sharded architecture are discussed in patent application Ser. No. 15/654,590 entitled “SYSTEM AND METHOD FOR OPTIMIZING DATA MIGRATION IN A PARTITIONED DATABASE,” filed on Jun. 20, 2017, which is incorporated by reference herein in its entirety. When accessing the provisioning service, the end user can, via a graphical user interface or a publicly accessible API, create their database cluster by providing a name for the cluster to create and/or a version of the database application. The end user can select one or more cloud providers, one or more geographic regions for their database resources, and specify an instance size. The user interface is configured to display selections for a replication factor (i.e., a number of nodes in a replica set), sharding, whether to enable automated backup services, and additional database configuration options. Once the selections are made, the system creates an instantiation of the user-defined cluster on the cloud resources for a particular selected geographic region.
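
For illustration, the provisioning request assembled from those selections might resemble the following payload. The field names and values here are hypothetical stand-ins rather than a documented API:

```python
import json

# Hypothetical cluster-creation payload built from the user's selections.
create_cluster_request = {
    "name": "my-cluster",        # cluster name supplied by the end user
    "databaseVersion": "4.4",    # version of the database application
    "provider": "AWS",           # selected cloud provider
    "region": "US_EAST_1",       # selected geographic region
    "instanceSize": "M30",       # selected instance size
    "replicationFactor": 3,      # number of nodes in the replica set
    "sharding": False,
    "backupEnabled": True,       # automated backup services
}
print(json.dumps(create_cluster_request, indent=2))
```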

It is realized that enabling cross-region deployments for cloud providers allows for improved disaster recovery and fault tolerance. Enabling cross-region deployments allows replica set members (e.g., a three node replica set) associated with a particular cloud provider to be distributed across multiple geographic regions. In some embodiments, the provisioning service can be configured to enable cross-region configuration options to specify the manner in which the replica set members are to be deployed across the multiple geographic regions. In one implementation, the end user may, via the graphical user interface, make a selection to deploy replica set members in different geographical regions. For example, the three node replica set (including a primary node and two secondary nodes) may be distributed across two or three different geographic regions. Each region may include a number of independent availability zones. Availability zones consist of one or more discrete data centers, each with redundant power, networking, and connectivity, housed in separate facilities. Clusters deployed in regions with two availability zones can be split across the two availability zones, where a three node replica set cluster may have two nodes deployed to one availability zone and the remaining node deployed to the other availability zone. Clusters deployed in regions with at least three availability zones can be split across three availability zones. For example, a three node replica set cluster may have one node deployed to each availability zone.
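
The availability-zone splits described above follow from spreading nodes as evenly as possible. A minimal sketch (the helper name is invented for this example):

```python
def distribute_across_zones(num_nodes: int, zones: list[str]) -> dict[str, int]:
    """Spread replica set members as evenly as possible across availability zones.

    With a three node replica set and two zones this yields a 2/1 split; with
    three zones it yields one node per zone, matching the deployments above.
    """
    counts = {zone: num_nodes // len(zones) for zone in zones}
    for zone in zones[: num_nodes % len(zones)]:
        counts[zone] += 1
    return counts

assert distribute_across_zones(3, ["az-a", "az-b"]) == {"az-a": 2, "az-b": 1}
assert distribute_across_zones(3, ["az-a", "az-b", "az-c"]) == {"az-a": 1, "az-b": 1, "az-c": 1}
```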

FIG. 1 shows a user interface 100 that allows a user to enable cross-region configuration options, for example, by clicking on the link “Enable cross-region configuration options” 102. As shown in FIG. 2, user interface 200 allows the user to select the nodes of the replica set to be distributed across three regions. For example, user interface 200 may include interface elements 202, 204, 206 enabling selection of us-east-1 as the preferred region (e.g., the region that contains a current primary node of the replica set) and selection of us-east-2 and us-west-1 as electable regions (e.g., regions that contain the secondary nodes of the replica set). For example, FIG. 2 illustrates that of the three nodes, a user has selected a first node to be deployed in a first region (e.g., preferred region), a second node to be deployed in a second region (e.g., one of the electable regions), and a third node to be deployed in a third region (e.g., the other electable region). Nodes in the electable regions participate in the election and automatic failover process to determine which of the secondary nodes will function as a new primary node if the current primary node fails. Such nodes may also be referred to as electable nodes. A configuration where replica set members are distributed across different geographical regions provides for improved availability guarantees even in cases of regional outages because if a node in a first geographical region fails, another node in a second geographical region may be elected to provide uninterrupted service.

In some embodiments, the user interface 200 allows the user to select deployment options for read-only replica set members across different geographical regions for purposes of improving performance of local reads (e.g., by reducing read latency). The read-only replica set members/nodes do not participate in the election and failover process. As shown in FIG. 3, user interface 300 allows the user to add (e.g., via interface element 302) a read-only replica member to the eu-west-2 region to serve local reads. FIG. 4 shows a configuration in which the replica set members are distributed across three different geographical regions and a read-only replica set member is added to yet another geographical region.

In some implementations, each of the regions can be configured with a respective virtual private cloud (VPC) architecture, where nodes within a region can communicate with one another via internal IPs. Cross-region communication is provided via a control plane that implements network mappings that allow nodes in one region to communicate with nodes in another region.

It is also realized that enabling cross-cloud deployments further improves disaster recovery and fault tolerance (in comparison to cross-region deployments), where clusters/replica set members are distributed across multiple cloud providers (e.g., AWS, GCP, Azure) and/or geographical regions. For example, a regional outage caused by a software bug or defect in a cross-region deployment for a particular cloud provider may be easily replicated across other regions despite the regions being physically separate. Such failures can negatively impact services provided to the end users. By contrast, cross-cloud deployments provide protection against not only natural disasters but also human errors, such as software bugs. For example, even if an entire cloud provider fails as a result of being impacted by a software bug, another cloud provider may be selected to provide services because it is highly unlikely that the other cloud provider is also affected by the same software bug. In addition, with a cross-cloud configuration, users can accrue the benefits of leveraging a mix of resources provided by multiple cloud providers (e.g., multi-cloud high availability, access to cloud-specific features, or new cloud regions) or move their data and their applications between cloud providers if they choose to.

In some embodiments, the provisioning service can be configured to enable cross-cloud configuration options to specify the manner in which the clusters/replica set members are to be deployed across multiple cloud providers and/or geographical regions. FIG. 5 is a block diagram of a cloud-based system 500 including a provisioning component, such as provisioning service 502, and cloud provider systems 504, 506, 508. A client system or end user 510 can access provisioning service 502 to create a distributed database across multiple cloud provider systems 504, 506, 508, via network 520. Cloud provider systems and cloud providers may be used interchangeably herein. In some embodiments, each of the cloud providers 504, 506, 508 may be a different cloud provider, such as AWS of AMAZON, AZURE Cloud, GOOGLE Cloud, and/or any other cloud provider. It will be appreciated that although one client/end user 510 is depicted in FIG. 5, system 500 may support and include multiple clients/end users.

According to one aspect, system 500 can provide a number of user interfaces or web-based platforms on which to access the system. A user can access the platform and/or interfaces and define the desired database configurations (e.g., size of node, storage, number of replicas, shard (y/n)), for example through the provisioning service (e.g., 502) and associated web site. Based on the user specifications, system 500 may enable creation, access, and use of a distributed database deployed across cloud providers 504, 506, 508.

System 500 can include a number of application programming interfaces configured to connect with cloud providers 504, 506, 508, define database configurations, provision cloud resources (e.g., networking and/or machine resources) to support the database configurations, establish default security configurations (e.g., VPC (virtual private cloud), TLS (transport layer security), and/or other data encryption, etc.), manage and apply networking rules and/or mappings to facilitate communication between cloud providers 504, 506, 508, capture database applications or settings from existing systems, and identify and execute updates and/or specific versions (e.g., associated with MONGODB™ binaries), among other options.

In one implementation, an end user may, via a graphical user interface (e.g., interfaces 600, 610, 700, 800, 810, 820, 830, 920, 930, 940, 1000, 1010), make a selection to deploy the clusters across multiple cloud providers. In some embodiments, the system 500 may provide an API to allow users to create, modify, delete, and/or otherwise configure the clusters. In some embodiments, the system 500 may allow the user to create a new cluster or reconfigure an existing cluster to span multiple cloud providers. When a user adds additional regions to the cluster via the user interface and/or public API, the system 500 provides the user with options to specify the cloud provider and geographic region to which the corresponding new nodes are to be allocated. With a cross-cloud cluster configuration, data can be replicated across regions in multiple cloud providers for latency or availability purposes. Clusters may also be deployed across different cloud providers in the same geographical region for high-availability and data-redundancy, for example, within a particular country.
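
For illustration, a cluster specification spanning providers might be expressed as a list of per-provider region configurations, as in the following hypothetical sketch (the field names are invented for this example):

```python
# Hypothetical multi-cloud cluster specification: each entry names the cloud
# provider and region to which the corresponding new nodes are allocated.
multi_cloud_spec = {
    "name": "my-cluster",
    "regionConfigs": [
        {"provider": "AWS",   "region": "US_EAST_1",    "electableNodes": 2},
        {"provider": "GCP",   "region": "US_CENTRAL_1", "electableNodes": 1},
        {"provider": "AZURE", "region": "EUROPE_NORTH", "readOnlyNodes": 2},
    ],
}
```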

FIGS. 6A and 6B illustrate user interfaces 600, 610 via which a user requests creation of a cluster in the AWS cloud provider 612. In response to user selection of interface element 620 “Create Cluster”, the provisioning service 502 may initiate creation of the cluster in the AWS cloud provider.

FIGS. 10A-10C illustrate user interfaces 1000, 1010, 1020 via which the user requests reconfiguration of an existing cluster created in the AWS cloud provider 1040. For example, the user may have previously created a cluster in the AWS cloud provider via user interface 1000. The user may then reconfigure the cluster to span multiple cloud providers via user interface 1010 of FIG. 10B. As shown in FIG. 10B, a user may indicate, via an interface element 1045 (e.g., toggle), that a cross-cloud or multi-cloud configuration is enabled. Interface elements 1060, 1070, 1080 may be provided to allow selection of different cloud providers for deploying nodes of the cluster. Additional interface elements 1062, 1064, 1072, 1074, 1082, 1084 may be provided to allow selection of respective regions and nodes for each cloud provider. In some embodiments, in response to a selection of interface element 1090, user interface 1020 of FIG. 10C may be generated and presented. User interface 1020 shows the changes between the original configuration and the new configuration. In response to selection of interface element 1092, the system 500 may apply the selected changes and create a cluster with the new configuration.

In some embodiments, cross-cloud deployments retain the same level of flexibility as the cross-region deployments. For example, a number of election-participating nodes or electable nodes may be configured in each individual region. In some embodiments, a primary node is elected from among the electable nodes. Priority of individual regions may be specified to ensure that nodes from high priority regions are prioritized during the election process. Analytics or Read Only nodes may be provisioned in any combination of regions and cloud providers. Analytics nodes are similar to read-only nodes but are provided for workload isolation, where analytics-based workload is isolated to these nodes without impacting operational performance. Cloud provider snapshot backups are provided for backup needs, and snapshots can be taken in the highest priority region. For example, FIG. 7 illustrates user interface 700 via which a user may request creation of a cluster which includes a number of selected electable nodes in a particular region 710 of the AWS cloud provider 702 and a number of selected read-only nodes in a particular region 712 of the GCP cloud provider 704. FIGS. 9A, 9B, 9C illustrate additional example user interfaces 920, 930, 940 via which a user may request creation of clusters by specifying configurations for cloud providers, regions, electable nodes, read-only nodes, analytics nodes, and/or other configurations.
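
For illustration, the per-region flexibility described above (electable nodes with region priorities, plus read-only and analytics nodes) might be captured in a structure such as the following sketch (the field names and values are invented for this example):

```python
# Hypothetical region configurations illustrating per-region node types.
# Higher priority means nodes in that region are favored during elections.
region_configs = [
    {"provider": "AWS", "region": "US_EAST_1",     "priority": 7, "electableNodes": 3},
    {"provider": "GCP", "region": "EUROPE_WEST_1", "priority": 0, "readOnlyNodes": 2},
    {"provider": "GCP", "region": "US_WEST_1",     "priority": 0, "analyticsNodes": 1},
]

def snapshot_region(configs):
    """Cloud provider snapshots are taken in the highest priority region."""
    return max(configs, key=lambda c: c["priority"])["region"]

assert snapshot_region(region_configs) == "US_EAST_1"
```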

According to some aspects, provisioning service 502 can be configured to enable cross-cloud configuration options for global clusters. Global clusters allow replica sets to be provisioned or deployed in different geographical regions called zones. For example, a zone may be configured to contain a three node replica set distributed across availability zones of its preferred cloud region. In some embodiments, a global cluster may be provisioned with multiple providers across different zones. Zones can be configured to contain a mix of cloud providers, or a particular zone can be replicated across multiple cloud providers. Such configurations allow placement and distribution of data to be managed in compliance with the strictest policies and data redundancy requirements while minimizing read latency (e.g., by keeping the data close to the requesting end users/clients). For example, FIGS. 8A, 8B, 8C, 8D illustrate user interfaces 800, 810, 820, 830 via which a user may request creation of global clusters by specifying configurations for zones, cloud providers, regions, electable nodes, read-only nodes, analytics nodes, and/or other configurations.
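
For illustration, a global cluster along these lines might be described by a configuration such as the following sketch (the zone names and field names are invented for this example):

```python
# Hypothetical global cluster: each zone serves clients in a geography and
# can mix cloud providers within its region configurations.
global_cluster = {
    "name": "global-cluster",
    "zones": [
        {"zone": "NorthAmerica", "regionConfigs": [
            {"provider": "AWS", "region": "US_EAST_1",    "electableNodes": 2},
            {"provider": "GCP", "region": "US_CENTRAL_1", "electableNodes": 1},
        ]},
        {"zone": "Europe", "regionConfigs": [
            {"provider": "AZURE", "region": "EUROPE_NORTH", "electableNodes": 3},
        ]},
    ],
}
```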

FIG. 5 illustrates an example cross-cloud deployment where a five-node cluster/replica set is deployed across three different cloud providers—three nodes 520, 522, 524 are deployed in a first cloud provider 504, one node 526 is deployed in a second cloud provider 506, and one node 528 is deployed in a third cloud provider 508. Any number of cloud compute resources (e.g., virtual machines or servers) may be assigned or used to execute the respective nodes in each cloud provider. Each cloud provider may include its own networking components or rules that enable communication and/or data transfer to/from the respective nodes of the cloud provider. The cloud compute resources and/or other components (e.g., other servers) of each cloud provider may manage data transfer (e.g., reads, writes, queries, etc.) to/from the respective nodes of the cloud provider. Although FIG. 5 illustrates a cross-cloud deployment for a distributed database supported by a replica set model with a replication factor of 5, other configurations may be used, such as models with a replication factor of 3, 7, or any other replication factor. Patent application Ser. No. 15/074,987, entitled “METHOD AND APPARATUS FOR MAINTAINING REPLICA SETS,” filed on Mar. 18, 2016, describes examples of replica sets and replica set models and is incorporated herein by reference in its entirety.

In some embodiments, in response to a request via client 510 (e.g., via the user interfaces described herein, such as selection of a “Create Cluster” interface element or “Apply Changes” interface element), provisioning service 502 can be configured to generate an instantiation of a distributed database on multiple cloud providers 504, 506, 508. Provisioning service 502 may communicate and/or coordinate with components of cloud providers 504, 506, 508 to generate the instantiation of the distributed database. Instantiation of the distributed database may include allocation of the five-node replica set 520, 522, 524, 526, 528 for database operations. Once instantiated, client 510 may connect to nodes deployed across different cloud providers 504, 506, 508 to perform database operations (e.g., reads, writes, and/or other operations). Patent application Ser. No. 15/627,613 entitled “SYSTEMS AND METHODS FOR MANAGING DISTRIBUTED DATABASE DEPLOYMENTS,” filed on Jun. 20, 2017, describes instantiation/deployment of a distributed database on a single cloud provider via a proxy layer executed on the cloud provider. Such a proxy layer may be provided on each of the cloud providers 504, 506, 508 to enable communication and/or database operations with respective nodes of the cloud provider.

In some embodiments, indirection layers managed by provisioning service 502 are used to bridge connections between multiple cloud providers. The indirection layer(s) may include a control plane that implements network mappings that allow nodes associated with one cloud provider and/or region to communicate with nodes of another cloud provider and/or region. In some implementations, Public IP Whitelisting may be used when connecting from one cloud provider to another cloud provider. In other implementations, a peered connection (e.g., VPC peering or VNet (Virtual Network) peering) may be used to communicate with nodes in a single cloud provider. End clients may be able to connect to a subset of their topology over the peered connection.

According to some aspects, the provisioning service 502 provides intuitive interfaces (e.g., FIGS. 10A, 10B, 10C) for architecting and updating cross-cloud configurations as and when needed. For example, a new cloud provider can be added to a cluster, or a cluster can be migrated from one cloud provider to another, without downtime or any change to application code. As shown in FIG. 10C, for example, in response to a user selection of interface element 1092, provisioning service 502 may generate an instantiation of the distributed database that spans multiple cloud providers (e.g., AWS, Azure, GCP). According to some aspects, instantiation of the distributed database that spans multiple cloud providers may include creation of networking containers for each of the cloud providers and creation of machines (e.g., virtual machines) on which the distributed database (e.g., MONGODB™ database) would execute. In the example of FIG. 10C, network containers and machines for Azure and GCP may be created because the corresponding components for AWS already exist. Each networking container specifies a network configuration for the respective cloud provider and includes networking resources or components that enable communication and/or data transfer to/from the respective cloud provider. One or more machines may be created to execute the respective number of nodes in each cloud provider. A cloud provider may assign an IP address to a machine created for the cloud provider.

In some embodiments, provisioning service 502 may manage and apply networking rules and/or mappings to facilitate communication between cloud providers 504, 506, 508. When creating networking containers for each cloud provider, provisioning service 502 may ensure that networking rules and/or mappings are appropriately applied between the networking containers to avoid conflicting network configurations in the cross-cloud deployment. The networking rules manage access, communication, and/or data transfer to/from the individual cloud providers and between the various cloud providers. The networking rules may include ingress/egress rules that manage communication and/or data transfer between cloud providers. For example, a networking container associated with a cloud provider (e.g., Azure) may include ingress/egress rules that enable communication and/or data transfer to/from nodes of another cloud provider (e.g., AWS, GCP). Provisioning service 502 may perform appropriate checks to ensure that ingress/egress rules are properly configured such that only desired IP ranges and ports are allowed.
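
For illustration, the check that ingress/egress rules admit only desired IP ranges and ports could be sketched as follows (the rule shape and allowed values are assumptions for this example):

```python
import ipaddress

# Hypothetical allowed sets for one networking container: CIDR ranges the
# peer clouds' machines occupy, and the database service port.
ALLOWED_CIDRS = [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("54.1.2.0/24")]
ALLOWED_PORTS = {27017}

def rule_is_valid(cidr: str, port: int) -> bool:
    """An ingress rule passes only if its range is contained in an allowed
    CIDR and its port is an allowed service port."""
    network = ipaddress.ip_network(cidr)
    cidr_ok = any(network.subnet_of(allowed) for allowed in ALLOWED_CIDRS)
    return cidr_ok and port in ALLOWED_PORTS

assert rule_is_valid("54.1.2.0/28", 27017)
assert not rule_is_valid("0.0.0.0/0", 27017)  # overly broad range rejected
```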

In some embodiments, communication and/or data transfer between cloud providers may be managed using IP access lists. Each cloud provider may maintain its own IP access list that enables control of communication/data transfer to/from the cloud provider. In addition, a master IP access list may be maintained that includes an up-to-date listing of IP addresses for all the cloud providers (i.e., a combination IP access list including the IP access list for each cloud provider). In some implementations, the IP access list for each cloud provider may be generated based on IP addresses assigned to the machines created by provisioning service 502 for the respective cloud provider.
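
For illustration, building the master IP access list as a combination of the per-provider lists can be sketched as follows (the addresses are invented for this example):

```python
# Per-provider IP access lists, derived from the addresses assigned to the
# machines provisioned in each cloud provider.
access_lists = {
    "aws":   ["54.1.2.3", "54.1.2.4", "54.1.2.5"],
    "gcp":   ["35.4.5.6"],
    "azure": ["40.7.8.9"],
}

def build_master_list(per_provider: dict[str, list[str]]) -> set[str]:
    """The master IP access list is the union of every provider's list."""
    return {ip for ips in per_provider.values() for ip in ips}

master_list = build_master_list(access_lists)
assert "35.4.5.6" in master_list and "40.7.8.9" in master_list
```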

According to one aspect, updates or changes to the master IP access list may be monitored to ensure that the updates or changes are propagated to the appropriate cloud provider. For example, if the master IP access list includes entries for three cloud providers (Azure, GCP, AWS), and one or more entries for a first cloud provider are updated, the system may filter the list to ensure that the updated entries are propagated to the other two cloud providers and not the first cloud provider. Ensuring that the IP access lists across the different cloud providers are updated to account for changes allows for seamless communication and/or data transfer between the cloud providers. For example, if a user generates a write request that writes to a primary node of a replica set in a first cloud provider, the networking rules and configurations may be used to transfer (replicate) the data to a secondary node in a different cloud provider. Similarly, if a user generates a read request to read data, the networking rules and configurations may be used to read data from a secondary node in the different cloud provider.
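
For illustration, the propagation rule in the example above (updated entries are pushed to the other providers' lists and not back to the source provider) might look like the following sketch:

```python
def propagate_updates(updated_provider: str, updated_ips: list[str],
                      per_provider: dict[str, list[str]]) -> None:
    """Push changed entries to every provider except the one they came from,
    keeping each cloud provider's access list current with its peers."""
    for provider, access_list in per_provider.items():
        if provider == updated_provider:
            continue  # filter out the source provider
        for ip in updated_ips:
            if ip not in access_list:
                access_list.append(ip)

# Example: a new AWS machine's address is propagated to the GCP and Azure
# lists but not duplicated into the AWS list (addresses are invented).
access_lists = {"aws": ["54.1.2.3"], "gcp": ["35.4.5.6"], "azure": ["40.7.8.9"]}
propagate_updates("aws", ["54.1.2.6"], access_lists)
assert "54.1.2.6" in access_lists["gcp"] and "54.1.2.6" in access_lists["azure"]
assert "54.1.2.6" not in access_lists["aws"]
```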

In some embodiments, communication and/or data transfer between the cloud providers may be performed across the Internet but encrypted over the wire with TLS (transport layer security). For example, communication link 525 may enable TLS encrypted communication between cloud providers 504 and 506, and communication link 535 may enable TLS encrypted communication between cloud providers 506 and 508.
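
For illustration, such a cross-provider link is, on the wire, an ordinary TLS-wrapped socket. A minimal Python sketch follows (the peer hostname and port are invented for this example):

```python
import socket
import ssl

# Dial a replica set member in another cloud provider over the public
# Internet, with the byte stream encrypted by TLS.
context = ssl.create_default_context()  # verifies the peer's certificate

with socket.create_connection(("node1.cluster.example.net", 27017)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="node1.cluster.example.net") as tls_sock:
        print("negotiated", tls_sock.version())  # e.g., "TLSv1.3"
```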

According to some aspects, the networking rules may include rule(s) that prevent or disable peering connections between cloud providers. Peering connections may be allowed or enabled for individual cloud providers. Each cloud provider may offer peering that enables a tunnel/bridge to be built between two independent VPCs and allows the tunnel to be traversed in a private connection between networking components for virtual machines in the cloud provider.
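
For illustration, restricting peering to individual cloud providers reduces to a provider-equality check, as in this sketch (the helper name is invented):

```python
def peering_allowed(requester_provider: str, target_provider: str) -> bool:
    """Peering tunnels are permitted only between VPCs of the same cloud
    provider; cross-provider peering connections are prevented."""
    return requester_provider == target_provider

assert peering_allowed("aws", "aws")
assert not peering_allowed("aws", "gcp")
```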

The various processes described herein can be configured to be executed on the systems shown and described in the various patent applications incorporated by reference herein. The systems and/or system components can be programmed to execute the processes and/or functions described. Additionally, other computer systems can be specially configured to perform the operations and/or functions described herein. For example, various embodiments according to the present invention may be implemented on one or more computer systems. These computer systems may be specially configured, general-purpose computers such as those based on Intel Atom, Core, or PENTIUM-type processors, IBM PowerPC, AMD Athlon or Opteron, Sun UltraSPARC, or any other type of processor. It should be appreciated that one or more of any type of computer system may be used to host a database, a database replica, a database partition, a database shard, or a database chunk, and to perform functions associated with replica sets, data partitions, and shards as described in the various patent applications incorporated by reference herein. Further, the computer systems can be configured to execute the processes discussed above for managing a distributed database across multiple cloud provider systems. Additionally, any system may be located on a single computer or may be distributed among a plurality of computers attached by a communications network.

A general-purpose computer system can be specially configured as disclosed herein. According to one embodiment of the invention, the general-purpose computer system is configured to perform any of the operations and/or algorithms described herein. The operations and/or algorithms described herein can also be encoded as software executing on hardware that defines a processing component, that can define portions of a general-purpose computer, reside on an individual general-purpose computer, and/or reside on multiple general-purpose computers.

FIG. 11 shows a block diagram of an example general-purpose computer system 900 on which various aspects of the present invention can be practiced. For example, various aspects of the invention can be implemented as specialized software executing in one or more computer systems, including general-purpose computer systems 1104, 1106, and 1108 communicating over network 1102 shown in FIG. 13. Computer system 900 may include a processor 906 connected to one or more memory devices 910, such as a disk drive, memory, or other device for storing data. Memory 910 is typically used for storing programs and data during operation of the computer system 900. Components of computer system 900 can be coupled by an interconnection mechanism 908, which may include one or more busses (e.g., between components that are integrated within a same machine) and/or a network (e.g., between components that reside on separate discrete machines). The interconnection mechanism enables communications (e.g., data, instructions) to be exchanged between system components of system 900.

Computer system 900 may also include one or more input/output (I/O) devices 902-904, for example, a keyboard, mouse, trackball, microphone, touch screen, a printing device, display screen, speaker, etc. Storage 912 typically includes a computer readable and writeable nonvolatile recording medium in which computer executable instructions are stored that define a program to be executed by the processor, or information stored on or in the medium to be processed by the program.

The medium can, for example, be a disk 1002 or flash memory as shown in FIG. 12. Typically, in operation, the processor causes data to be read from the nonvolatile recording medium into another memory 1004 that allows for faster access to the information by the processor than does the medium. This memory is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static memory (SRAM). According to one embodiment, the computer-readable medium comprises a non-transient storage medium on which computer executable instructions are retained.

Referring again to FIG. 11, the memory can be located in storage 912 as shown, or in memory system 910. The processor 906 generally manipulates the data within the memory 910, and then copies the data to the medium associated with storage 912 after processing is completed. A variety of mechanisms are known for managing data movement between the medium and the integrated circuit memory element, and the invention is not limited thereto. The invention is not limited to a particular memory system or storage system.

The computer system may include specially-programmed, special-purpose hardware, for example, an application-specific integrated circuit (ASIC). Aspects of the invention can be implemented in software, hardware or firmware, or any combination thereof. Although computer system 900 is shown by way of example as one type of computer system upon which various aspects of the invention can be practiced, it should be appreciated that aspects of the disclosure are not limited to being implemented on the computer system as shown in FIG. 11. Various aspects of the invention can be practiced on one or more computers having different architectures or components than those shown in FIG. 11.

It should be appreciated that the invention is not limited to executing on any particular system or group of systems. Also, it should be appreciated that the invention is not limited to any particular distributed architecture, network, or communication protocol.

Various embodiments of the invention can be programmed using an object-oriented programming language, such as Java, C++, Ada, or C# (C-Sharp). Other object-oriented programming languages may also be used. Alternatively, functional, scripting, and/or logical programming languages can be used. Various aspects of the invention can be implemented in a non-programmed environment (e.g., documents created in HTML, XML or other format that, when viewed in a window of a browser program, render aspects of a graphical-user interface (GUI) or perform other functions). The system libraries of the programming languages are incorporated herein by reference. Various aspects of the invention can be implemented as programmed or non-programmed elements, or any combination thereof.

Various aspects of this invention can be implemented by one or more systems similar to system 900. For instance, the system can be a distributed system (e.g., client server, multi-tier system) comprising multiple general-purpose computer systems. In one example, the system includes software processes executing on a system associated with hosting database services, processing operations received from client computer systems, interfacing with APIs which receive and process client requests, interfacing with driver operations, and performing operations associated with various nodes.

The systems can be distributed among a communication system such as the Internet. One such distributed network, as discussed below with respect to FIG. 13, can be used to implement various aspects of the invention.

FIG. 13 shows an architecture diagram of an example distributed system 1100 suitable for implementing various aspects of the invention. It should be appreciated that FIG. 13 is used for illustration purposes only, and that other architectures can be used to facilitate one or more aspects of the invention.

System 1100 may include one or more specially configured general-purpose computer systems distributed among a network 1102 such as, for example, the Internet. Such systems may cooperate to perform the various functions and processes described herein. In an example of one such system, one or more computer systems 1104, 1106, and 1108 are configured to be nodes in a replica set. The replica set is configured to respond to client requests for database access. In one setting, access to the database occurs through various APIs and associated drivers. In one example, client computer systems can interface with computer systems 1104-1108 via an Internet-based interface.

In another example, a system 1104 can be accessed through a browser program such as the Microsoft Internet Explorer application program, Mozilla's FireFox, or Google's Chrome browser through which one or more websites can be accessed. Further, there can be one or more application programs that are executed on system 1104 that perform functions associated with responding to client interactions. For example, system 1104 may respond to provisioning requests by configuring and deploying various data elements across multiple cloud provider systems as described herein.

Network 1102 may also include, as part of a system for managing a distributed database across multiple cloud provider systems, one or more server systems, which can be implemented on general-purpose computers that cooperate to perform various functions and processes described herein. System 1100 may execute any number of software programs or processes on various hardware, and the invention is not limited to any particular type or number of processes. Such processes can perform the various workflows associated with a system for managing read requests.

Certain implementations of database/cloud based systems can employ any number of the following elements. Each of the elements can be configured to perform the listed functions individually, collectively, and in various combinations.

In one embodiment, a system can be configured to perform one or more and any combination of the following processes/functions:

-   Allow users to provision a replica set/sharded cluster (e.g., in MongoDB's Atlas) with electable nodes across N regions in cloud provider X and read-only/analytics nodes across M regions in cloud provider Y. For example:
    -   5 node replica set
        -   3 electable nodes in AWS US_EAST_1 and US_WEST_1
        -   2 read-only nodes in Azure EUROPE_NORTH and Azure EUROPE_WEST
-   Allow cloud provider selection via the user interface
-   Allow for instance size selection in the user interface that accounts for multiple providers.

In some embodiments, the lowest common denominator set of information available for each instance size may be displayed via the user interface, for example, providing hardware and storage specifications that are common and available across the multiple providers.
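
For illustration, computing the lowest-common-denominator display values reduces to taking the minimum of each per-provider figure, as in the following sketch (the instance specifications are invented for this example):

```python
# Hypothetical per-provider specifications for one instance size tier.
instance_specs = {
    "aws":   {"ram_gb": 8, "storage_gb": 40, "max_storage_gb": 4096},
    "gcp":   {"ram_gb": 8, "storage_gb": 40, "max_storage_gb": 2048},
    "azure": {"ram_gb": 8, "storage_gb": 32, "max_storage_gb": 1024},
}

def common_denominator(specs: dict[str, dict[str, int]]) -> dict[str, int]:
    """Display the minimum of each figure so that the shown specification is
    available on every selected provider."""
    keys = set.intersection(*(set(s) for s in specs.values()))
    return {k: min(s[k] for s in specs.values()) for k in keys}

# e.g., {'ram_gb': 8, 'storage_gb': 32, 'max_storage_gb': 1024}
print(common_denominator(instance_specs))
```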

-   Allow for cloud provider snapshot backups
-   Allow support for various features of Atlas, such as, but not limited to:
    -   MongoDB Stitch and MongoDB Realm
    -   MongoDB Charts
    -   Live Migration
    -   BI Connector
    -   Atlas Online Archive to Atlas Data Lake
    -   Encryption at Rest with customer key management
-   Allow support for SRV records for cross-cloud clusters
-   Allow users to provision cross-cloud configurations based on geographical requirements where data remains within a particular region, but where availability requirements mean spreading across multiple providers in that region makes sense. For example, a preferred region might be Azure UK South, with a secondary region on Azure UK West and a third region on GCP UK
-   Allow users to instantiate election-participating and workload specific nodes in region(s) on different cloud providers
-   Allow users to choose their cross-cloud configuration across multiple providers in a flexible manner (e.g., where it's roughly the same throughput on each cloud/lowest common denominator between them, etc.)
-   Allow users to replicate data across regions in multiple cloud providers, whether for latency or availability purposes
-   Allow users to order regional priority amongst regions of multiple cloud providers
-   Allow users to target Analytics or Read Only nodes to region(s) on multiple cloud providers, or on different cloud providers from the election-participating nodes
-   Allow users to replicate across multiple cloud providers in a particular country and keep backups in that country
-   Allow users to leverage multiple cloud providers in Zones of a Global Cluster
-   Allow users to connect to their cluster via different connection options, such as, but not limited to, public IP whitelisting, VPC/VNet peering, and private endpoints
-   Allow users to migrate from one cloud provider to another seamlessly
-   Peering and private endpoints—When using peering with a cross-cloud/multi-cloud cluster, connections are configured by embodiments to a subset of nodes that match the peering connection's provider. For example, the system can be configured to limit peering and private endpoints to occur intra-cloud and prevent inter-cloud peering or private endpoint connections. In further embodiments, the system can manage a connection that is attempted to a cluster over peering which has a primary node in a different cloud provider, and enable the connection for secondary reads
-   Publish DNS records and propagate—Remove cloud-provider specific subdomains in the DNS records maintained by the system to enable cross-cloud deployment, e.g., connection strings used to connect to Azure or GCP such as “abc12.azure.cloud.mongodb.com” or “xyz12.gcp.cloud.mongodb.com” may be updated to remove the “.azure” and “.gcp” subdomains
-   Allow encryption of data in a cross-cloud deployment—each cloud provider offers a key management service (e.g., AWS KMS, Azure KeyVault, and GCP KMS). Using an abstraction layer, a master key may be fetched from each individual cloud provider's key management service. Multiple secondary keys may be derived from the master keys, and the secondary keys may be used to encrypt data. The benefit of using the abstraction layer is that there is no reliance on the cloud provider of the key management solution being the same cloud provider as the underlying node that bears data (see the sketch following this list).
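
For illustration, the key-management abstraction described in the last item can be sketched as follows. This is a schematic example: the fetch function below is a placeholder standing in for each provider's key management client, and HMAC-based derivation is one plausible choice rather than the actual mechanism.

```python
import hashlib
import hmac

def fetch_master_key(provider: str) -> bytes:
    """Placeholder for a call to AWS KMS, Azure KeyVault, or GCP KMS; a real
    system would fetch the master key from the provider's key service."""
    return hashlib.sha256(f"master-key-for-{provider}".encode()).digest()

def derive_secondary_key(master_key: bytes, purpose: bytes) -> bytes:
    """Derive a secondary (data-encryption) key from a provider master key.
    Because derivation happens behind the abstraction layer, the KMS provider
    need not match the cloud provider hosting the node that bears the data."""
    return hmac.new(master_key, purpose, hashlib.sha256).digest()

for provider in ("aws", "azure", "gcp"):
    master = fetch_master_key(provider)
    data_key = derive_secondary_key(master, b"replica-node-0")
```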

Having thus described several aspects and embodiments of this invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description is by way of example only.

Use of ordinal terms such as “first,” “second,” “third,” “a,” “b,” “c,” etc., in the claims to modify or otherwise identify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.

What is claimed is:
1. A system for managing a distributed database across multiple cloud provider systems, the system comprising: at least one processor operatively connected to a memory; and a provisioning component, executed by the at least one processor, configured to accept user specification of configuration for the distributed database across the multiple cloud provider systems, wherein the provisioning component is further configured to: receive, via a user interface, a selection of hardware and storage specifications that are common and available across the multiple cloud provider systems; and configure the distributed database across the multiple cloud provider systems based on the user specification at least in part by configuring a replica set including electable nodes and read-only nodes across the multiple cloud provider systems.
2. The system of claim 1, wherein the electable nodes are deployed across one or more geographical regions associated with one or more cloud provider systems and the read-only nodes are deployed across one or more geographic regions associated with the one or more cloud provider systems.
3. The system of claim 1, wherein configuring the replica set across the multiple cloud provider systems further comprises: deploying at least one electable node at a first cloud provider system; deploying at least one read-only node at a second cloud provider system different from the first cloud provider system; and deploying at least one analytics node provided for workload isolation at a third cloud provider system different from the first cloud provider system and the second cloud provider system.
4. The system of claim 3, wherein configuring the replica set across the multiple cloud provider systems further comprises: deploying a first electable node at the first cloud provider system; and deploying a second electable node at a fourth cloud provider system different from the first cloud provider system.
5. The system of claim 1, wherein configuring the replica set across the multiple cloud provider systems comprises: configuring the replica set across a first set of cloud provider systems; and reconfiguring the replica set across a second set of cloud provider systems different from the first set.
6. The system of claim 1, wherein configuring the replica set across the multiple cloud provider systems comprises: managing networking rules that enable communication and/or data transfer between the multiple cloud provider systems.
7. The system of claim 6, wherein the networking rules include at least one rule that prevents peering connections between the multiple cloud provider systems.
8. The system of claim 1, wherein configuring the replica set across the multiple cloud provider systems comprises: managing communication and/or data transfer between the multiple cloud provider systems using IP access lists, wherein an IP access list for each cloud provider system includes one or more IP addresses assigned to one or more virtual machines created for the respective cloud provider system.
9. The system of claim 8, wherein managing communication and/or data transfer between the multiple cloud provider systems using IP access lists comprises: monitoring updates or changes to a master IP access list; and propagating the updates or changes to at least one IP access list associated with at least one of the multiple cloud provider systems.
10. The system of claim 1, further comprising: a communication link between a first cloud provider system and a second cloud provider system of the multiple cloud provider systems, wherein data communication via the communication link is encrypted with TLS (transport layer security).
11. A method for managing a distributed database across multiple cloud provider systems, the method comprising: using at least one computer hardware processor to perform: receiving a user specification of configuration for the distributed database across the multiple cloud provider systems, wherein receiving the user specification comprises: receiving, via a user interface, a selection of hardware and storage specifications that are common and available across the multiple cloud provider systems; and configuring the distributed database across the multiple cloud provider systems based on the user specification at least in part by configuring a replica set including electable nodes and read-only nodes across the multiple cloud provider systems.
12. The method of claim 11, further comprising: generating an instantiation of the distributed database on the multiple cloud provider systems, wherein the instantiation of the distributed database includes an allocation of the replica set for database operations.
13. The method of claim 11, wherein configuring the replica set across the multiple cloud provider systems further comprises: deploying at least one electable node at a first cloud provider system; deploying at least one read-only node at a second cloud provider system different from the first cloud provider system; and deploying at least one analytics node provided for workload isolation at a third cloud provider system different from the first cloud provider system and the second cloud provider system.
14. The method of claim 11, wherein configuring the replica set across the multiple cloud provider systems comprises: configuring the replica set across a first set of cloud provider systems; and reconfiguring the replica set across a second set of cloud provider systems different from the first set.
15. The method of claim 11, wherein configuring the replica set across the multiple cloud provider systems comprises: managing networking rules that enable communication and/or data transfer between the multiple cloud provider systems.
16. The method of claim 15, wherein the networking rules include at least one rule that prevents peering connections between the multiple cloud provider systems.
17. The method of claim 11, wherein configuring the replica set across the multiple cloud provider systems comprises: managing communication and/or data transfer between the multiple cloud provider systems using IP access lists.
18. The method of claim 17, wherein an IP access list for each cloud provider system includes one or more IP addresses assigned to one or more virtual machines created for the respective cloud provider system.
19. The method of claim 17, wherein managing communication and/or data transfer between the multiple cloud provider systems using IP access lists comprises: monitoring updates or changes to a master IP access list; and propagating the updates or changes to at least one IP access list associated with at least one of the multiple cloud provider systems.
20. At least one non-transitory computer readable storage medium storing processor-executable instructions that, when executed by at least one hardware processor, cause the at least one hardware processor to perform a method for managing a distributed database across multiple cloud provider systems, the method comprising: receiving a user specification of configuration for the distributed database across the multiple cloud provider systems, wherein receiving the user specification comprises: receiving, via a user interface, a selection of hardware and storage specifications that are common and available across the multiple cloud provider systems; and configuring the distributed database across the multiple cloud provider systems based on the user specification at least in part by configuring a replica set including electable nodes and read-only nodes across the multiple cloud provider systems.